Overview

Dataset Statistics

Number of Variables 14
Number of Rows 19158
Missing Cells 20733
Missing Cells (%) 7.7%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 13.2 MB
Average Row Size in Memory 721.4 B
Variable Types
  • Numerical: 3
  • Categorical: 11

Dataset Insights

gender has 4508 (23.53%) missing values
enrolled_university has 386 (2.01%) missing values
education_level has 460 (2.40%) missing values
major_discipline has 2813 (14.68%) missing values
company_size has 5938 (30.99%) missing values
company_type has 6140 (32.05%) missing values
last_new_job has 423 (2.21%) missing values
city_development_index is skewed
city has high cardinality: 123 distinct values
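
The dataset-level missing figures can be cross-checked against the per-variable counts. The sketch below only re-adds numbers printed in this report; it does not read the underlying data. The experience count (65) comes from that variable's own section, since it is not listed among the insights above.

```python
# Figures copied from this report; the dataset itself is not loaded here.
n_rows, n_cols = 19158, 14
missing_per_column = {
    "gender": 4508,
    "enrolled_university": 386,
    "education_level": 460,
    "major_discipline": 2813,
    "experience": 65,  # reported in the variable section, not in the insights
    "company_size": 5938,
    "company_type": 6140,
    "last_new_job": 423,
}

# Total missing cells and their share of all cells (rows x columns).
total_missing = sum(missing_per_column.values())
missing_pct = 100 * total_missing / (n_rows * n_cols)

print("missing cells:", total_missing)
print("missing cells (%):", round(missing_pct, 1))
for name, count in missing_per_column.items():
    print(f"{name}: {100 * count / n_rows:.2f}%")
```

The per-column counts add up to the reported 20733 missing cells, which is about 7.7% of the 19158 x 14 cells.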

Variables


enrollee_id

numerical

Approximate Distinct Count 19158
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 299.3 KB
Mean 16875.3582
Minimum 1
Maximum 33380
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • enrollee_id is very slightly skewed left (γ1 = -0.0184), i.e. close to symmetric

Quantile Statistics

Minimum 1
5-th Percentile 1782.85
Q1 8554.25
Median 16982.5
Q3 25169.75
95-th Percentile 31782.15
Maximum 33380
Range 33379
IQR 16615.5

Descriptive Statistics

Mean 16875.3582
Standard Deviation 9616.2926
Variance 9.2473e+07
Sum 3.233e+08
Skewness -0.01839
Kurtosis -1.1962
Coefficient of Variation 0.5698
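
Several of the figures above are derived from others, so they can be recomputed as a consistency check. This is a minimal sketch using only the values printed for enrollee_id. The comparison with a uniform distribution is an added observation, not something the report states: a continuous uniform variable has standard deviation range/sqrt(12) and excess kurtosis -1.2, which is close to what is reported here for an identifier column.

```python
import math

# Values copied from the enrollee_id section of this report.
minimum, maximum = 1, 33380
q1, q3 = 8554.25, 25169.75
mean, std = 16875.3582, 9616.2926

value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as IQR
variance = std ** 2               # reported as Variance
cv = std / mean                   # reported as Coefficient of Variation

# Reference point (an assumption for comparison only): the standard
# deviation of a continuous uniform distribution with the same range.
uniform_std = value_range / math.sqrt(12)

print(value_range, iqr, f"{variance:.4e}", round(cv, 4), round(uniform_std, 1))
```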

city

categorical

Approximate Distinct Count 123
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory Size 1.3 MB
  • The largest value (city_103) is over 1.61 times larger than the second largest value (city_21)

Length

Mean 7.5103
Standard Deviation 0.5084
Median 8
Minimum 6
Maximum 8

Sample

1st row city_103
2nd row city_40
3rd row city_21
4th row city_115
5th row city_162

Letter

Count 76632
Lowercase Letter 76632
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 48092
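
The row labels in the Letter block match Unicode general category names (Ll, Lu, Zs, Pd, Nd). The sketch below counts characters by those categories, assuming that is how the figures were produced; the report does not say so. The helper name char_class_counts is hypothetical, and the input is the five sample rows shown above rather than the full column.

```python
import unicodedata
from collections import Counter

# Unicode general categories mapped to the labels used in this report.
LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def char_class_counts(values):
    """Count characters per category; other categories (e.g. '_') are skipped."""
    counts = Counter({label: 0 for label in LABELS.values()})
    for value in values:
        for ch in value:
            label = LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    return counts

sample = ["city_103", "city_40", "city_21", "city_115", "city_162"]
counts = char_class_counts(sample)
print(dict(counts))
```

Every city value contributes four lowercase letters, which agrees with the reported 76632 = 4 x 19158. The underscore falls in a category that is not listed, so it is not counted.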

city_development_index

numerical

Approximate Distinct Count 93
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 299.3 KB
Mean 0.8288
Minimum 0.448
Maximum 0.949
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • city_development_index is skewed left (γ1 = -0.9953)

Quantile Statistics

Minimum 0.448
5-th Percentile 0.624
Q1 0.74
Median 0.903
Q3 0.92
95-th Percentile 0.926
Maximum 0.949
Range 0.501
IQR 0.18

Descriptive Statistics

Mean 0.8288
Standard Deviation 0.1234
Variance 0.01522
Sum 15879.07
Skewness -0.9953
Kurtosis -0.5387
Coefficient of Variation 0.1488
  • city_development_index is not normally distributed (p-value 9.92e-21)
  • city_development_index has 17 outliers
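
A negative γ1 means the longer tail is on the low side, which fits the mean (0.8288) sitting below the median (0.903). As a minimal sketch, the moment-based skewness coefficient can be computed as below. The report does not state which estimator it uses (some tools apply a small-sample correction), and the input list is a hypothetical left-tailed sample, not data from this dataset.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical values with a long tail toward low values.
left_tailed = [0.45, 0.62, 0.74, 0.90, 0.91, 0.92, 0.92, 0.93]
g1 = skewness(left_tailed)
print("skewed left" if g1 < 0 else "skewed right or symmetric")
```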

gender

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 4508
Missing (%) 23.5%
Memory Size 989.8 KB
  • The largest value (Male) is over 10.68 times larger than the second largest value (Female)

Length

Mean 4.182
Standard Deviation 0.5639
Median 4
Minimum 4
Maximum 6

Sample

1st row Male
2nd row Male
3rd row Male
4th row Male
5th row Male

Letter

Count 61267
Lowercase Letter 46617
Space Separator 0
Uppercase Letter 14650
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Male, Female) together account for over 50.0% of the values

relevent_experience

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.6 MB
  • The largest value (Has relevent experience) is over 2.57 times larger than the second largest value (No relevent experience)

Length

Mean 22.7199
Standard Deviation 0.4491
Median 23
Minimum 22
Maximum 23

Sample

1st row Has relevent exper...
2nd row No relevent experi...
3rd row No relevent experi...
4th row No relevent experi...
5th row Has relevent exper...

Letter

Count 396952
Lowercase Letter 377794
Space Separator 38316
Uppercase Letter 19158
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Has relevent experience, No relevent experience) together account for over 50.0% of the values
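
The report gives no per-category counts for this variable, but with only two categories and no missing values they can be estimated from the mean string length. The sketch below solves mean = p * len_a + (1 - p) * len_b for the share p. The helper name two_category_counts is hypothetical, and because the printed mean is rounded to four decimals the result may be off by a row or so.

```python
def two_category_counts(mean_len, len_a, len_b, n_rows):
    """Estimate the counts of two categories from their mean string length."""
    share_a = (mean_len - len_b) / (len_a - len_b)
    count_a = round(share_a * n_rows)
    return count_a, n_rows - count_a

has_exp, no_exp = two_category_counts(
    22.7199,                         # mean length from the Length block
    len("Has relevent experience"),  # 23 characters
    len("No relevent experience"),   # 22 characters
    19158,                           # number of rows, no missing values
)
ratio = has_exp / no_exp
print(has_exp, no_exp, round(ratio, 2))
```

The estimated ratio is consistent with the "over 2.57 times" insight above.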

enrolled_university

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 386
Missing (%) 2.0%
Memory Size 1.4 MB
  • The largest value (no_enrollment) is over 3.68 times larger than the second largest value (Full time course)

Length

Mean 13.7919
Standard Deviation 1.3224
Median 13
Minimum 13
Maximum 16

Sample

1st row no_enrollment
2nd row no_enrollment
3rd row Full time course
4th row no_enrollment
5th row Part time course

Letter

Count 235174
Lowercase Letter 230219
Space Separator 9910
Uppercase Letter 4955
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (no_enrollment, Full time course) together account for over 50.0% of the values
  • Among tokenized words, the most frequent (no_enrollment) is over 2.79 times as frequent as the second (course)

education_level

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 460
Missing (%) 2.4%
Memory Size 1.3 MB
  • The largest value (Graduate) is over 2.66 times larger than the second largest value (Masters)

Length

Mean 8.0785
Standard Deviation 1.5312
Median 8
Minimum 3
Maximum 14

Sample

1st row Graduate
2nd row Graduate
3rd row Graduate
4th row Graduate
5th row Masters

Letter

Count 148727
Lowercase Letter 127704
Space Separator 2325
Uppercase Letter 21023
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Graduate, Masters) together account for over 50.0% of the values

major_discipline

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 2813
Missing (%) 14.7%
Memory Size 1.1 MB
  • The largest value (STEM) is over 21.66 times larger than the second largest value (Humanities)

Length

Mean 4.5435
Standard Deviation 1.9598
Median 4
Minimum 4
Maximum 15

Sample

1st row STEM
2nd row STEM
3rd row STEM
4th row Business Degree
5th row STEM

Letter

Count 73714
Lowercase Letter 13343
Space Separator 550
Uppercase Letter 60371
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (STEM, Humanities) together account for over 50.0% of the values

experience

categorical

Approximate Distinct Count 22
Approximate Unique (%) 0.1%
Missing 65
Missing (%) 0.3%
Memory Size 1.2 MB
  • The largest value (>20) is over 2.3 times larger than the second largest value (5)

Length

Mean 1.6542
Standard Deviation 0.7553
Median 1
Minimum 1
Maximum 3

Sample

1st row >20
2nd row 15
3rd row 5
4th row <1
5th row >20

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 27775
  • Among tokenized words, the most frequent (20) is over 2.4 times as frequent as the second (5)

company_size

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.1%
Missing 5938
Missing (%) 31.0%
Memory Size 917.2 KB

Length

Mean 6.0486
Standard Deviation 1.6676
Median 6
Minimum 3
Maximum 9

Sample

1st row 50-99
2nd row 50-99
3rd row 50-99
4th row <10
5th row 50-99

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 9893
Decimal Number 66743

company_type

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 6140
Missing (%) 32.0%
Memory Size 932.6 KB
  • The largest value (Pvt Ltd) is over 9.81 times larger than the second largest value (Funded Startup)

Length

Mean 8.3556
Standard Deviation 3.4525
Median 7
Minimum 3
Maximum 19

Sample

1st row Pvt Ltd
2nd row Pvt Ltd
3rd row Funded Startup
4th row Funded Startup
5th row Pvt Ltd

Letter

Count 95794
Lowercase Letter 68755
Space Separator 12979
Uppercase Letter 27039
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Pvt Ltd, Funded Startup) together account for over 50.0% of the values

last_new_job

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 423
Missing (%) 2.2%
Memory Size 1.2 MB
  • The largest value (1) is over 2.44 times larger than the second largest value (>4)

Length

Mean 1.6991
Standard Deviation 1.3345
Median 1
Minimum 1
Maximum 5

Sample

1st row 1
2nd row >4
3rd row never
4th row never
5th row 4

Letter

Count 12260
Lowercase Letter 12260
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 16283
  • The top 2 categories (1, >4) together account for over 50.0% of the values
  • Among tokenized words, the most frequent (1) is over 1.86 times as frequent as the second (4)

training_hours

numerical

Approximate Distinct Count 241
Approximate Unique (%) 1.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 299.3 KB
Mean 65.3669
Minimum 1
Maximum 336
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • training_hours is skewed right (γ1 = 1.8191)

Quantile Statistics

Minimum 1
5-th Percentile 7
Q1 23
Median 47
Q3 88
95-th Percentile 188
Maximum 336
Range 335
IQR 65

Descriptive Statistics

Mean 65.3669
Standard Deviation 60.0585
Variance 3607.0188
Sum 1.2523e+06
Skewness 1.8191
Kurtosis 3.8392
Coefficient of Variation 0.9188
  • training_hours is not normally distributed (p-value 5.33e-04)
  • training_hours has 984 outliers
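
The report does not say how outliers are defined. One common convention, assumed here, is Tukey's rule: values more than 1.5 x IQR below Q1 or above Q3. The sketch below applies it to the quartiles printed in this report; the function name tukey_fences is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences under Tukey's k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# training_hours: Q1 = 23, Q3 = 88, 95th percentile = 188, maximum = 336
th_low, th_high = tukey_fences(23, 88)

# city_development_index: Q1 = 0.74, Q3 = 0.92, minimum = 0.448, maximum = 0.949
cdi_low, cdi_high = tukey_fences(0.74, 0.92)

print("training_hours fences:", th_low, th_high)
print("city_development_index fences:", round(cdi_low, 3), round(cdi_high, 3))
print("share of rows flagged in training_hours:", round(100 * 984 / 19158, 2), "%")
```

Under this assumption the upper fence for training_hours (185.5) lies just below the 95th percentile (188), so slightly more than 5% of rows would be flagged, in line with the reported 984 outliers (about 5.14%). For city_development_index only the lower fence (about 0.47) is crossed, since the minimum is 0.448 and the maximum 0.949 is well below the upper fence.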

target

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.7 MB
  • The largest value (Not looking for job change) is over 3.01 times larger than the second largest value (Looking for a job change)

Length

Mean 25.5013
Standard Deviation 0.8653
Median 26
Minimum 24
Maximum 26

Sample

1st row Looking for a job ...
2nd row Not looking for jo...
3rd row Not looking for jo...
4th row Looking for a job ...
5th row Not looking for jo...

Letter

Count 411922
Lowercase Letter 392764
Space Separator 76632
Uppercase Letter 19158
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Not looking for job change, Looking for a job change) together account for over 50.0% of the values

Interactions

Correlations

Missing Values